asymptotically (for number of entities $N \to \infty$) to the Kullback-Leibler cross-entropy $D_{KL}$; for
equiprobable categories in a system, $H$ converges to the Shannon entropy $H_{Sh}$. However, in many
cases $W$ or $P$ is not multinomial and/or does not satisfy an asymptotic limit. Such systems cannot
meaningfully be analysed with $D_{KL}$ or $H_{Sh}$, but can be analysed directly by MaxProb. This study
reviews several examples, including (a) non-asymptotic systems; (b) systems with indistinguishable
entities (quantum statistics); (c) systems with indistinguishable categories; (d) systems represented
by urn models, such as "neither independent nor identically distributed" (ninid) sampling; and
combinatorial definition of entropy is shown to be of greater importance for "probabilistic inference"

1. INTRODUCTION

The combinatorial or probabilistic definition of entropy, given by Boltzmann, is usually written as [1, 2]:

$$S_{tot} = N S = k \ln W \qquad (1)$$

where $S_{tot}$ is the total thermodynamic entropy of a system, $S$ is the entropy per unit entity, $N$ is the number of entities, $W$ is the number of occurrences of a specified realization of the system (its statistical weight) and $k$ is the Boltzmann constant. This can be rewritten to give dimensionless forms of the entropy and cross-entropy (directed divergence or negative relative entropy) functions, respectively [3, 4, 5, 6, 7, 8, 9, 10, 11]:

$$H = K \ln W \qquad (2)$$

$$D = -K \ln P \qquad (3)$$

where $P$ is the probability of a given realization and $K$ is a dimensionless constant. Since $\ln x$ is monotonic with $x$, maximisation of $H$ or minimisation of $D$, subject to the constraints on a system, always yields its "most probable" (MaxProb) realization(s), and so can be used to infer the properties of the system. If a system is governed by the multinomial weight or distribution, respectively:

$$W_{mult} = N! \prod_{i=1}^{s} \frac{1}{n_i!} \qquad (4)$$

$$P_{mult} = N! \prod_{i=1}^{s} \frac{q_i^{n_i}}{n_i!} \qquad (5)$$

where $n_i \in \{\mathbb{N} \cup 0\}$ is the occupancy of each category $i = 1, \dots, s$ and $q_i$ is its source ("prior") probability, then (2)-(3) with $K = N^{-1}$ converge asymptotically ($N \to \infty$) [12] to the Shannon entropy [13] or Kullback-Leibler cross-entropy functions [14, 15]:

$$H_{Sh} = \lim_{N \to \infty} \frac{1}{N} \ln W_{mult} = -\sum_{i=1}^{s} p_i \ln p_i \qquad (6)$$

$$D_{KL} = -\lim_{N \to \infty} \frac{1}{N} \ln P_{mult} = \sum_{i=1}^{s} p_i \ln \frac{p_i}{q_i} \qquad (7)$$

where $p_i = n_i/N$ is the frequency or probability of occupancy of the $i$th category. Eqs. (6)-(7) are commonly used in the maximum entropy (MaxEnt) or minimum cross-entropy (MinXEnt) extremisation methods to infer the "least informative" or "most uncertain" distribution $p^*$ of the system [16, 17, 18, 19], based on axiomatic justifications developed in information theory [13, 20].
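
The asymptotic convergence described above can be checked numerically. The following is a minimal sketch, assuming the standard multinomial weight and distribution and using the log-gamma function in place of factorials; the function names and the example vectors `p` and `q` are illustrative choices, not taken from the original work.

```python
import math

def log_multinomial_weight(n):
    """ln W for the multinomial weight W = N! / prod(n_i!)."""
    N = sum(n)
    return math.lgamma(N + 1) - sum(math.lgamma(ni + 1) for ni in n)

def log_multinomial_prob(n, q):
    """ln P for the multinomial distribution P = N! prod(q_i**n_i / n_i!)."""
    return log_multinomial_weight(n) + sum(
        ni * math.log(qi) for ni, qi in zip(n, q) if ni > 0)

def shannon(p):
    """Shannon entropy -sum p_i ln p_i (natural logarithm)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kullback_leibler(p, q):
    """Kullback-Leibler cross-entropy sum p_i ln(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative (assumed) frequencies and prior probabilities.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

for m in (1, 10, 100, 1000):
    n = [5 * m, 3 * m, 2 * m]          # occupancies with n_i / N = p_i
    N = sum(n)
    H_N = log_multinomial_weight(n) / N      # combinatorial entropy, K = 1/N
    D_N = -log_multinomial_prob(n, q) / N    # combinatorial cross-entropy
    print(N, round(H_N, 5), round(shannon(p), 5),
          round(D_N, 5), round(kullback_leibler(p, q), 5))
```

At small $N$ the combinatorial values differ visibly from the Shannon and Kullback-Leibler values; the gap closes only as $N$ grows, which is the point made in the text.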

It is important to recognise, however, that $W$ or $P$ may not be multinomial and/or may not satisfy an asymptotic limit. Extremisation methods based on (6) or (7) will then give a distribution which is unrepresentative of the system, except in special instances. In such cases, it is preferable to apply the MaxProb principle (2)-(3) directly, to obtain the most probable distribution of the system. Of course, it is recognised that in non-asymptotic systems ($N \ll \infty$), the most probable distribution may not be the only observable distribution; in other words, there may be a significant spread around the inferred distribution [10]. Furthermore, due to quantisation effects, the actual realizable MaxProb distribution(s) may be sub-optimal [10]. Despite these effects, the MaxProb principle provides a powerful tool for "probabilistic inference" of the properties of a probabilistic system, irrespective of its form.
The aim of this work is to demonstrate the utility of the MaxProb principle in a number of systems of physical interest: (a) non-asymptotic systems; (b) systems with indistinguishable entities (quantum statistics); (c) systems with indistinguishable categories; (d) systems represented by urn models, e.g. "neither independent nor identically distributed" (ninid) sampling; and (e) systems representable in graphical form, such as decision trees and networks. Definitions of terms are provided in §2, following which the above systems are examined in §3-7. Particular attention is paid to (c), to explore the peculiar properties of systems with indistinguishable categories. The case studies serve as evidence that Boltzmann's definition (2)-(3) is of much greater utility for probabilistic inference than the Shannon or Kullback-Leibler functions (6)-(7) of information theory.



2. DEFINITIONS

FIG. 1: Definition of terms used in the combinatorial definition of entropy (figure labels: entity, category, configuration, realization).

To avoid confusion, it is necessary to define several terms, discussed in reference to the combinatorial allocation scheme ("ball-in-box" model) shown in Figure 1 [5, 6, 7, 8, 9, 10, 11]; this scheme encompasses both physical and mathematical (information-theoretic) interpretations. We make the following definitions:

• An entity $u_m$, $m = 1, \dots, N$, is a discrete particle, object or agent, or an individual selection of a discrete random variable, which acts separately but not necessarily independently of other entities.

• A category $c_i$, $i = 1, \dots, s$, is a possible assignment of an entity (e.g. an energy level, side of a die or alphabetic symbol). Although not shown in Figure 1, categories can be degenerate (involving $g_i$ subcategories in each category $i$) and/or multivariate (involving a vector index $\mathbf{i}$).

• A probabilistic system is the ordered triple $T(U, C, \xi)$, consisting of a finite set of entities $U = \{u_m\}$; a finite set of categories $C = \{c_i\}$ (possibly a set of multivariate degenerate sets) with $C \cap U = \emptyset$; and a discrete random variable $\xi: U \to C$. In other words, $\xi$ is a function which assigns all entities $u_m \in U$ to selected categories $c_i \in C$ in accordance with some probabilistic rule (not all categories need be selected). This definition encompasses both physical and mathematical situations.

• A configuration is an identifiable permutation or pattern of entities amongst the categories, i.e. a set of assignments $\{U \to C\}$ (in physics, a complexion or microstate; in gambling or informatics, a sequence). A configuration is thus a property of a system as a whole.

• A realization is an aggregated arrangement of entities amongst the categories of a system, i.e. a set of configurations $\{\{U \to C\}^{(1)}, \{U \to C\}^{(2)}, \dots\}$ specified by some rule. A common rule is to take the number of entities in each category, as specified by the occupancy vector or tensor $\mathbf{n} = \{n_i\}$ (in physics, a macrostate; in informatics, an outcome or type). Realizations are here considered mutually exclusive (this requirement could be relaxed to give some very different types of systems).

• The statistical weight $W^{(\nu)}$ of the $\nu$th realization $\mathbf{n}^{(\nu)}$ is the number of ways in which it can occur, i.e. its number of configurations.

• The governing probability $P^{(\nu)}$ of the $\nu$th realization $\mathbf{n}^{(\nu)}$ is its probability of occurrence, i.e. the sum of probabilities of its component configurations.

Figure 1 shows the allocation of distinguishable entities to distinguishable categories, without replacement, until all $N$ available entities are exhausted (see §3). This allocation scheme can be varied in many ways.

We therefore wish to conduct probabilistic inference, i.e. to infer the properties of a probabilistic system $T(U, C, \xi)$, using the available information about its set of realizations $\{\mathbf{n}^{(\nu)}\}$ with weights $\{W^{(\nu)}\}$ or probabilities $\{P^{(\nu)}\}$. Two "measures of central tendency" are evident:

• One measure - arguably the most important for inferring the "typical" behaviour of a system - is the most probable (MaxProb) or modal realization [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]:

$$\mathbf{n}^* = \arg\max_{\mathbf{n}^{(\nu)}} P^{(\nu)} \qquad (8)$$

Its use depends on the principle that "A system can be represented by its realization of highest probability". A significant advantage of MaxProb is that it can often be found by extremisation or optimisation methods. Of course, multimodal distributions will have multiple maxima, an inherent aspect of this method [4, 5, 8].

• Another measure is the mean-weighted, superpositional or expected occurrence realization (MeanProb), in which each realization is weighted by its weight or probability [4]:

$$\bar{\mathbf{n}} = \frac{\sum_{\nu} \mathbf{n}^{(\nu)} W^{(\nu)}}{\sum_{\nu} W^{(\nu)}} = \sum_{\nu} \mathbf{n}^{(\nu)} P^{(\nu)} \qquad (9)$$

This measure is important for non-asymptotic systems and those with skewed distributions, but its calculation can become formidable as the number of realizations increases (often, an exponential function of $N$).
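
The two measures can be illustrated by direct enumeration for a small system. This is a minimal sketch, assuming a multinomial system with equiprobable categories (so that each realization has probability $W/s^N$); the helper names are hypothetical and the brute-force enumeration is only practical for small $N$ and $s$.

```python
import math
from fractions import Fraction

def realizations(N, s):
    """Yield every occupancy vector (n_1, ..., n_s) with sum N."""
    if s == 1:
        yield (N,)
        return
    for first in range(N + 1):
        for rest in realizations(N - first, s - 1):
            yield (first,) + rest

def multinomial_weight(n):
    """W = N! / prod(n_i!), computed exactly with integers."""
    w = math.factorial(sum(n))
    for ni in n:
        w //= math.factorial(ni)
    return w

def maxprob_and_meanprob(N, s):
    """Return (modal realizations, their weight, their probability, mean realization).

    Assumes equiprobable categories, so P = W / s**N for each realization.
    """
    weights = {n: multinomial_weight(n) for n in realizations(N, s)}
    total = sum(weights.values())              # equals s**N configurations
    w_max = max(weights.values())
    modes = sorted(n for n, w in weights.items() if w == w_max)
    mean = [Fraction(sum(n[i] * w for n, w in weights.items()), total)
            for i in range(s)]
    return modes, w_max, Fraction(w_max, total), mean

modes, w, p, mean = maxprob_and_meanprob(4, 3)
print(modes, w, float(p), [str(m) for m in mean])
```

For $N = 4$, $s = 3$ the enumeration returns three equally probable modal realizations, while the MeanProb realization is uniform and non-integer, showing that the two measures need not coincide in non-asymptotic systems.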

Both MaxProb and MeanProb are independent of any information-theoretic or axiomatic considerations, other than those of probability theory itself. This is absolutely essential, since in any contradiction between information theory and probability theory, the latter - being more fundamental - must triumph [8]. The two measures also do not require asymptotic behaviour, and so can be applied to systems with finite numbers of entities.

Whilst this study differs from Jaynes [16, 17, 18] over the philosophical meaning of the entropy concept, the "subjective Bayesian" definition of probabilities - as assignments based on what we know - is adopted here. It is also recognised that there are many different ways to assign entities and categories within a system, and many ways to group configurations into realizations, with any particular choice being dependent on the observer's purpose. This leads to the "subjective" (or "observer-dependent") interpretation of the entropy concept, a viewpoint staunchly defended by Jaynes [16]. This was aptly expressed by Tseng and Caticha [21]:

"Entropy is not a property of a system . . . [it] is a property of our description of a system."

Different observers (indeed, the same observer), with different available information and/or different purposes, can therefore make different probability and entropy assignments for the same system, leading to different (rational) conclusions; this is a necessary feature of probabilistic inference. The test of validity of such inference is the extent of its agreement with observations, responsibility for which again lies with the observer and his/her social cohort. Such sentiments in no way weaken the mathematical rigour of the probabilistic method, as set out in the following sections, nor the rules of probability theory upon which it is based.



3. NON-ASYMPTOTIC MULTINOMIAL SYSTEMS

We first examine univariate multinomial systems, the original application of Boltzmann's principle [1, 2]. From a Bayesian perspective, there are many reasons why one might (rationally) select the multinomial distribution (5) to represent a system [8]; it encompasses, but does not imply, a "frequentist" approach [16, 17, 18]. For maximum generality, we include the source or prior distribution $q_i$; in physics, this is often interpreted as the number of distinguishable subcategories or degeneracy $g_i$ of each category $i$, normalised by the total degeneracy of the system $G = \sum_{i=1}^{s} g_i$ pTi. For constant $N$, applying the combinatorial definition (3) to the multinomial distribution (5) (taking $K = N^{-1}$) yields the non-asymptotic cross-entropy function [6, 7, 8, 10, 11]:

$$-D^{(N)}_{mult} = \frac{1}{N} \ln P_{mult} = \frac{1}{N} \left\{ \ln N! + \sum_{i=1}^{s} \left[ n_i \ln q_i - \ln n_i! \right] \right\} \qquad (10)$$

Either (10), or $\ln P_{mult}$ itself, can be maximised by the Lagrangian method subject to the constraints:

$$\sum_{i=1}^{s} n_i = N, \qquad (11)$$

$$\sum_{i=1}^{s} n_i f_{ri} = F_r, \quad r = 1, \dots, R, \qquad (12)$$

where $f_{ri}$ is some function of each category $i$ and $F_r$ is its total value, to infer the most probable distribution of the system [H [71 [101 E]:

$$p_i^* = \frac{n_i^*}{N} = \frac{1}{N} \Lambda^{-1} \left[ \frac{1}{N} \ln N! + \ln q_i - \lambda_0 - \sum_{r=1}^{R} \lambda_r f_{ri} \right] \qquad (13)$$

where $\lambda_r$ is the Lagrangian multiplier associated with the $r$th constraint, $\Lambda^{-1}(y) = \psi^{-1}(y) - 1$ is the upper inverse of the function $\Lambda(x) = \psi(x + 1)$ and $\psi(x)$ is the digamma function. The Massieu function $\lambda_0$ cannot be factored from (13), hence the latter must be solved simultaneously with all constraints (11)-(12). In the asymptotic limit $N \to \infty$, the above extremisation converges to the Boltzmann distribution:

$$p_i^* = \frac{n_i^*}{N} = q_i \exp\left( -\lambda_0' - \sum_{r=1}^{R} \lambda_r f_{ri} \right) = Z^{-1} q_i \exp\left( -\sum_{r=1}^{R} \lambda_r f_{ri} \right) \qquad (14)$$

where $\lambda_0' = \lambda_0 + 1$ and $Z = \sum_{i=1}^{s} q_i \exp(-\sum_{r=1}^{R} \lambda_r f_{ri})$ is the partition function.
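
The difference between the non-asymptotic MaxProb distribution and its asymptotic Boltzmann form can be seen in a small brute-force example. The sketch below is not the Lagrangian solution of the text: it simply enumerates all realizations of an assumed three-category system with equal prior probabilities and an assumed function $f = (0, 1, 2)$, picks the one of greatest multinomial weight under both constraints, and compares it with the Boltzmann distribution of the same mean (multiplier found by bisection). All names and parameter values are illustrative.

```python
import math

def multinomial_weight(n):
    """W = N! / prod(n_i!), computed exactly with integers."""
    w = math.factorial(sum(n))
    for ni in n:
        w //= math.factorial(ni)
    return w

def maxprob_constrained(N, F, f=(0, 1, 2)):
    """Brute-force MaxProb realization for three equiprobable categories,
    subject to sum(n_i) = N and sum(n_i * f_i) = F."""
    best, best_w = None, -1
    for n0 in range(N + 1):
        for n1 in range(N - n0 + 1):
            n = (n0, n1, N - n0 - n1)
            if sum(ni * fi for ni, fi in zip(n, f)) != F:
                continue
            w = multinomial_weight(n)
            if w > best_w:
                best, best_w = n, w
    return best

def boltzmann(mean, f=(0, 1, 2), lo=-50.0, hi=50.0):
    """Asymptotic form p_i = exp(-lam * f_i) / Z for uniform q_i,
    with the multiplier lam found by bisection on the mean of f."""
    def dist(lam):
        w = [math.exp(-lam * fi) for fi in f]
        z = sum(w)
        return [wi / z for wi in w]
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        m = sum(pi * fi for pi, fi in zip(dist(mid), f))
        if m > mean:      # the mean decreases as lam increases
            lo = mid
        else:
            hi = mid
    return dist(0.5 * (lo + hi))

N, F = 30, 24
n_star = maxprob_constrained(N, F)
p_boltz = boltzmann(F / N)
print(n_star, [ni / N for ni in n_star], [round(pi, 4) for pi in p_boltz])
```

The realizable MaxProb distribution is restricted to multiples of $1/N$, so it can only approximate the Boltzmann distribution at finite $N$; this is the quantisation effect referred to in the text.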

The effect of $N$ on the properties of non-asymptotic multinomial systems, including (i) the discrepancy between inferred MaxProb and Kullback-Leibler MinXEnt distributions (13)-(14), (ii) the spread of realizations around the MaxProb distribution, and (iii) the importance of quantisation, is examined elsewhere [10]. The analyses reveal the importance of $N$ in statistical mechanics. The information-theoretic properties of non-asymptotic multinomial systems have also been examined, in which the change in "information" is defined as the negative change in the non-asymptotic entropy analogue of (10) [c.f. El [231 [21 [SUES], i.e.:

$$\Delta I \text{ (bits)} = -\frac{\Delta H^{(N)}}{\ln 2} = -\frac{K \Delta \ln W}{\ln 2} \qquad (15)$$

both for binary systems ($s = 2$) [6] and equiprobable systems in general [7]. The analyses show that "information" consists of two parts: one associated with knowledge of the realization $\{n_i\}$ and the other associated with knowledge of $N$. Such findings overturn the prevailing wisdom in communications and information theory, in which $N$ is assumed to be infinite and therefore irrelevant [13, 20].

The MaxProb principle has also been applied to the analysis of a non-asymptotic, closed thermodynamic system of non-interacting particles (a double system-bath with heat transfer), using the multinomial distribution [10]. This shows that in such systems, thermodynamic intensive variables such as temperature are well-defined at small $N$ and do not require a "thermodynamic limit" [10]. This concurs with similar findings by other workers, from different perspectives [27], as well as with common practice in engineering analyses of heat transfer [28].



4. DISTINGUISHABILITY OF ENTITIES 

Consideration of the effect of indistinguishable entities in the 1920s is perhaps the most famous application of MaxProb [29, 30, 31, 32, 33], providing the groundwork for the development of quantum theory. This has now led to the following four allocation schemes (in physics, referred to as "statistics") [23, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40]:

• Maxwell-Boltzmann (MB) statistics, in which distinguishable entities are allocated to distinguishable degenerate categories, with no restrictions on the occupancies;

• Lynden-Bell (LB) statistics, as for Maxwell-Boltzmann statistics but with a maximum of one entity per subcategory [38, 39, 40];

• Bose-Einstein (BE) statistics, in which indistinguishable entities (bosons) are allocated to distinguishable degenerate categories, with no restrictions on the occupancies; and

• Fermi-Dirac (FD) statistics, as for Bose-Einstein statistics (involving fermions) but with a maximum of one entity per subcategory.

BE and FD statistics were developed for quantum systems, but have found many other applications, e.g. the application of FD statistics to the packing of granular materials [41]. LB statistics were developed for collisionless particle systems, such as gravitational stellar dynamics [38, 39, 40]. The commonly adopted statistical weights of these statistics are given in Table I [e.g. [23|30l|3ll[32l[33|3l[3g[Ml3Zl[3^. Note that the MB statistic is multinomial (5). Only the simplest, univariate version of each statistic is given here; their formulation is scrutinised more closely in [11].

From the combinatorial definition of entropy (2) and MaxProb principle (8), the non-asymptotic and asymptotic entropy functions and most probable distributions - calculated subject to the constraints (11)-(12) - are listed in Table I. As with multinomial systems (§3), the inferred non-asymptotic most probable distribution obtained by extremisation may differ from the actual (realizable) distribution(s), due to quantisation effects [10]. Note that the asymptotic LB and FD distributions are identical up to normalisation, although their meaning is different [39, 40]. The BE and FD weights converge to $W_{MB}/N!$ in highly degenerate systems $g_i \gg n_i$, whilst the LB weight converges directly to $W_{MB}$; in the same limit, the LB, BE and FD entropies and most probable distributions all converge (up to a constant) to those of the MB distribution. From the pattern of the weights (Table I), we can also define a distinguishable-entity equivalent of BE statistics with weight $W_{D:D} = N! \prod_{i=1}^{s} (g_i + n_i - 1)!/[(g_i - 1)!\, n_i!]$, which for $g_i \gg n_i$ also converges to $W_{MB}$; this does not appear to have been examined previously.
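
The high-degeneracy limits quoted above can be checked with exact integer arithmetic. Table I is not reproduced here, so the sketch below assumes the standard textbook forms of the four weights (MB: $N! \prod g_i^{n_i}/n_i!$; LB: $N! \prod \binom{g_i}{n_i}$; BE: $\prod \binom{g_i + n_i - 1}{n_i}$; FD: $\prod \binom{g_i}{n_i}$) together with the $W_{D:D}$ weight defined in the text; the function names and test occupancies are illustrative.

```python
import math

def w_mb(n, g):
    """Maxwell-Boltzmann: N! * prod(g_i**n_i / n_i!)."""
    w = math.factorial(sum(n))
    for ni, gi in zip(n, g):
        w = w * gi**ni // math.factorial(ni)
    return w

def w_lb(n, g):
    """Lynden-Bell: N! * prod(C(g_i, n_i)), at most one entity per subcategory."""
    return math.factorial(sum(n)) * math.prod(math.comb(gi, ni) for ni, gi in zip(n, g))

def w_be(n, g):
    """Bose-Einstein: prod(C(g_i + n_i - 1, n_i))."""
    return math.prod(math.comb(gi + ni - 1, ni) for ni, gi in zip(n, g))

def w_fd(n, g):
    """Fermi-Dirac: prod(C(g_i, n_i))."""
    return math.prod(math.comb(gi, ni) for ni, gi in zip(n, g))

def w_dd(n, g):
    """Distinguishable-entity equivalent of BE statistics: N! * W_BE."""
    return math.factorial(sum(n)) * w_be(n, g)

n = (2, 1, 3)                       # illustrative occupancies
N_fact = math.factorial(sum(n))
for degeneracy in (10, 1000, 10**6):
    g = (degeneracy,) * len(n)
    mb = w_mb(n, g)
    print(degeneracy,
          w_be(n, g) * N_fact / mb,   # -> 1 from above
          w_fd(n, g) * N_fact / mb,   # -> 1 from below
          w_lb(n, g) / mb,            # -> 1 from below
          w_dd(n, g) / mb)            # -> 1 from above
```

All four ratios approach unity as $g_i/n_i$ grows, consistent with the stated convergence of the BE and FD weights to $W_{MB}/N!$ and of the LB and $W_{D:D}$ weights to $W_{MB}$.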

The non-asymptotic BE and FD statistics have important information-theoretic implications [6, 7]. Using the combinatorial definition of information (15), it is shown that the observation of a finite number of bosons or fermions requires the input of energy or information; from the second law of thermodynamics, this is thermodynamically irreversible. A single boson or fermion must therefore appear to behave as if it were an infinite number of entities until its moment of observation. This "information relativity" perspective provides a rational explanation for the "collapse of the wavefunction" in quantum systems, which is not explained by present-day quantum theory, and for which many metaphysical justifications have been proposed [42].

It is also possible to derive intermediate statistics which interpolate between BE and FD statistics. Several alternatives are available:

• Gentile statistics, in which indistinguishable entities are allocated to distinguishable categories with restriction $n_i \in \{0, 1, \dots, m\}$ entities per subcategory [19, 43, 44, 45].

• Haldane-Wu statistics, in which entities are allocated to categories using a generalised Pauli exclusion principle [46, 47].

• Acharya-Swamy statistics, proposed by ansatz [48] and now with several justifications [47, 49, 50, 51]; see also m

• Cattani-Fernandes statistics, derived by a combinatorics argument using quantum group theory [52, 53, 54].

Other intermediate statistics have also been proposed. Their main application has been to quantum particle systems, but curiously, only in the asymptotic limit $N \to \infty$. Gentile statistics have also been applied to the analysis of socioeconomic and transport systems, again only in asymptotic form [e.g. 19].



5. DISTINGUISHABILITY OF CATEGORIES 

By logical extension of §4, we can also consider the allocation of (in)distinguishable entities to indistinguishable categories. Despite the fact that indistinguishable categories are part of the "folklore" of combinatorics, and are included in published tables of the number of combinations or permutations of different allocation schemes (e.g. the "twelve-fold way") [55, 56, 57, 58, 59], the entropy functions and most probable distributions of such systems have only recently been examined [9]. For convenience, we define:

• D:I statistics, in which $N$ distinguishable entities are allocated to $s$ indistinguishable categories;

• I:I statistics, in which $N$ indistinguishable entities are allocated to $s$ indistinguishable categories.

The D:I case has been examined for univariate, non-degenerate and equally degenerate categories [9], whilst the I:I case has not previously been examined. In the following, the non-degenerate forms of each statistic are discussed in detail, followed by their equally degenerate forms.



5.1. Non-Degenerate D:I and I:I Statistics

FIG. 2: Allocation schemes for non-degenerate indistinguishable categories: (a) D:I statistic and (b) I:I statistic.

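Firstly examining the non-degenerate D:I statistic illustrated in Figure 2(a), we denote the weight of each realization $\{n_i\}$ by:

$$W_{D:I} = \left\{\!\!\left\{ \begin{matrix} N \\ n_1, \dots, n_k, 0, \dots, 0 \end{matrix} \right\}\!\!\right\} \qquad (16)$$

where $k \le s$ is the number of filled categories ($n_i > 0$). By combinatorial enumeration of some simple examples, the following features emerge [9]:

• Unfilled categories do not affect the weight, i.e. [9]:

$$\left\{\!\!\left\{ \begin{matrix} N \\ n_1, \dots, n_k, 0, \dots, 0 \end{matrix} \right\}\!\!\right\} = \left\{\!\!\left\{ \begin{matrix} N \\ n_1, \dots, n_k \end{matrix} \right\}\!\!\right\} \qquad (17)$$

• Permutations of the occupancies are meaningless, e.g. {1,2,1} and {1,1,2} refer to the same realization [9]. This is quite different to multinomial and quantum systems (§3-4), in which permuting the occupancies generates different realizations.

It can be shown that the weight is [9]:

$$W_{D:I} = \frac{N!}{\prod_{i=1}^{s} n_i! \, \prod_{j=1}^{N} r_j!} \qquad (18)$$

where $r_j \ge 0$ is the repetitivity, or number of occurrences of integer $j$ in the realization $\{n_i\}$ (without counting zeros), hence $\sum_{j=1}^{N} r_j = k$. Proof of (18) follows from the successive filling of cells [9]. The weight satisfies ElEolISi]:

$$\left\{ \begin{matrix} N \\ k \end{matrix} \right\} = \sum_{\substack{\text{all } \{n_i\} \\ \text{fixed } k}} \left\{\!\!\left\{ \begin{matrix} N \\ n_1, \dots, n_k, 0, \dots, 0 \end{matrix} \right\}\!\!\right\} \qquad (19)$$

$$B(N, s) = \sum_{k=1}^{s} \left\{ \begin{matrix} N \\ k \end{matrix} \right\} = \sum_{k=1}^{s} \sum_{\substack{\text{all } \{n_i\} \\ \text{fixed } k}} \left\{\!\!\left\{ \begin{matrix} N \\ n_1, \dots, n_k, 0, \dots, 0 \end{matrix} \right\}\!\!\right\} \qquad (20)$$

where $\left\{ \begin{matrix} N \\ k \end{matrix} \right\}$ is a Stirling number of the second kind and $B(N, s)$ is an incomplete Bell number, equal to the total number of configurations [55, 57, 58, 59]. $B(N, s)$ reduces to the usual Bell number [58] for $s = N$.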
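
The weight (18) and the sum rules (19)-(20) can be verified by enumeration. The following is a minimal sketch, assuming realizations are represented as integer partitions of $N$ (non-increasing tuples of the filled occupancies); the function names are hypothetical, and the Stirling numbers are generated from their standard recurrence.

```python
import math
from collections import Counter

def partitions(N, max_part=None):
    """Yield the integer partitions of N as non-increasing tuples."""
    if max_part is None:
        max_part = N
    if N == 0:
        yield ()
        return
    for first in range(min(N, max_part), 0, -1):
        for rest in partitions(N - first, first):
            yield (first,) + rest

def weight_di(parts):
    """D:I weight N! / (prod n_i! * prod r_j!) for the filled occupancies."""
    w = math.factorial(sum(parts))
    for ni in parts:
        w //= math.factorial(ni)
    for rj in Counter(parts).values():   # r_j = multiplicity of the integer j
        w //= math.factorial(rj)
    return w

def stirling2(N, k):
    """Stirling number of the second kind by the standard recurrence."""
    if N == k:
        return 1
    if k == 0 or k > N:
        return 0
    return k * stirling2(N - 1, k) + stirling2(N - 1, k - 1)

def incomplete_bell(N, s):
    """B(N, s): total weight over realizations with at most s filled categories."""
    return sum(weight_di(p) for p in partitions(N) if len(p) <= s)

N = 5
for k in range(1, N + 1):
    total_k = sum(weight_di(p) for p in partitions(N) if len(p) == k)
    print(k, total_k, stirling2(N, k))
best = max(partitions(N), key=weight_di)
print(best, weight_di(best), weight_di(best) / incomplete_bell(N, N))
```

For $N = s = 5$ the enumeration gives a total of 52 configurations (the Bell number) and a most probable realization {2, 2, 1} of weight 15.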

Applying the combinatorial definition (2) with $K = N^{-1}$ to (18) yields the non-asymptotic entropy [9]:

$$H^{(N)}_{D:I} = \frac{1}{N} \ln W_{D:I} = \frac{1}{N} \left\{ \sum_{i=1}^{s} \left[ \frac{n_i}{N} \ln N! - \ln n_i! \right] - \sum_{j=1}^{N} \ln r_j! \right\} \qquad (21)$$

where the $\ln N!$ term is brought inside the first sum using $\sum_{i=1}^{s} n_i = N$. As evident, finding the asymptotic form or extremisation of (21) requires careful handling of the $\{r_j\}$, and is therefore not as straightforward as in classical or quantum statistics. For $N \to \infty$ (hence $s \ll N$) and $n_i \to \infty, \forall i$, application of the Stirling approximation $\ln m! \approx m \ln m - m$ and the associated limits $r_{j \ne \infty} = 0$, $r_{\infty} = k$ gives, for $k \ll \infty$ [9]:

$$\lim_{\substack{N \to \infty \\ n_i \to \infty, \forall i}} H^{(N)}_{D:I} = -\sum_{i=1}^{s} p_i \ln p_i \qquad (22)$$

$H^{(N)}_{D:I}$ thus converges to the Shannon entropy (6) under these conditions. Outside of these limits, e.g. for $s > N$, (22) does not apply, since it is critically dependent on $n_i \to \infty, \forall i$, not just on $N \to \infty$ [9]. The D:I statistic thus differs substantially from the multinomial in its asymptotic properties.
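
The dependence of the asymptotic form on $n_i \to \infty$ for every category, rather than on $N \to \infty$ alone, can be illustrated numerically. This is a minimal sketch, assuming the D:I weight $N!/(\prod n_i! \prod r_j!)$ with $K = 1/N$ and evaluating factorials through the log-gamma function; the function names and test realizations are illustrative.

```python
import math
from collections import Counter

def entropy_di(n):
    """Non-asymptotic D:I entropy H = (1/N) ln W, zeros in n ignored."""
    filled = [ni for ni in n if ni > 0]
    N = sum(filled)
    ln_w = math.lgamma(N + 1)
    ln_w -= sum(math.lgamma(ni + 1) for ni in filled)
    # r_j = number of categories sharing the same occupancy j
    ln_w -= sum(math.lgamma(rj + 1) for rj in Counter(filled).values())
    return ln_w / N

def shannon(n):
    """Shannon entropy of the frequencies p_i = n_i / N."""
    N = sum(n)
    return -sum((ni / N) * math.log(ni / N) for ni in n if ni > 0)

# Case 1: few categories, every occupancy large -> Shannon limit is approached.
for m in (1, 10, 100, 10000):
    n = (3 * m, 2 * m, m)
    print(sum(n), round(entropy_di(n), 5), round(shannon(n), 5))

# Case 2: N grows but every occupancy stays at 1 (s = N) -> no convergence.
for N in (5, 50, 500):
    n = [1] * N
    print(N, round(entropy_di(n), 5), round(shannon(n), 5))
```

In the second case the weight is exactly 1 for any $N$ (all $N$ singletons are interchangeable), so the combinatorial entropy stays at zero while the Shannon entropy grows as $\ln N$.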
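We next examine the non-degenerate I:I statistic (Figure 2(b)). The weight can be denoted:

$$W_{I:I} = \left\lfloor\!\!\left\lfloor \begin{matrix} N \\ n_1, \dots, n_k, 0, \dots, 0 \end{matrix} \right\rfloor\!\!\right\rfloor \qquad (23)$$

I:I statistics have many features in common with the D:I case, e.g. unfilled categories have no effect on the realization or weight, and permutations of the occupancies are meaningless. However, by inspection it is readily seen that, in the non-degenerate case:

$$\left\lfloor\!\!\left\lfloor \begin{matrix} N \\ n_1, \dots, n_k, 0, \dots, 0 \end{matrix} \right\rfloor\!\!\right\rfloor = 1 \qquad (24)$$

In other words, each realization is equiprobable, rendering the MaxProb principle ineffective; non-degenerate BE and FD statistics also exhibit this property (Table I). Such systems must be examined using the MeanProb measure (9) (in effect, a weighted average MaxProb). For completeness, it can also be shown that:

$$\mathcal{P}_k(N) = \sum_{\substack{\text{all } \{n_i\} \\ \text{fixed } k}} \left\lfloor\!\!\left\lfloor \begin{matrix} N \\ n_1, \dots, n_k, 0, \dots, 0 \end{matrix} \right\rfloor\!\!\right\rfloor = \sum_{\substack{\text{all } \{n_i\} \\ \text{fixed } k}} 1 \qquad (25)$$

$$\mathcal{P}(N) = \sum_{k=1}^{s} \mathcal{P}_k(N) = \sum_{k=1}^{s} \sum_{\substack{\text{all } \{n_i\} \\ \text{fixed } k}} \left\lfloor\!\!\left\lfloor \begin{matrix} N \\ n_1, \dots, n_k, 0, \dots, 0 \end{matrix} \right\rfloor\!\!\right\rfloor \qquad (26)$$

where $\mathcal{P}_k(N)$ is a partition number and $\mathcal{P}(N)$ a cumulative partition number [58]; the latter gives the total number of configurations [55, 57, 58, 59].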

To consider some examples, the MaxProb (where possible) and MeanProb realizations of non-degenerate MB, BE, D:I and I:I systems subject only to the normalisation constraint (11), calculated by enumeration of all configurations, are listed in Tables II-IV for various values of $s$ and $N$. The MB and BE realizations are given as lists $[n_1, \dots, n_s]$, whilst the D:I and I:I realizations are represented as ordered sets $\{n_1 \ge \dots \ge n_s\}$ (the order is immaterial but convenient). As evident:

• The non-degenerate MB statistic (Table II) is highly symmetric, in that the entities try to spread as uniformly as possible over all available categories in both the MaxProb and MeanProb distributions. It is also strongly asymptotic, in that the MaxProb and MeanProb distributions converge rapidly to the uniform distribution, equivalent to the asymptotic distribution obtained by maximising the Shannon entropy (6).

• The non-degenerate BE statistic (Table II) is also highly symmetric and strongly asymptotic to a uniform distribution, as shown by its MeanProb distribution.

• In contrast, the non-degenerate D:I statistic is highly asymmetric: its MaxProb distribution has a "staircase" appearance, in many cases cascading to a region of unoccupied cells, whilst the MeanProb distribution decreases monotonically but remains positive. For $s = N$, this statistic appears to be inherently non-asymptotic, with no obvious convergence of the MaxProb or MeanProb distributions to any function; they also differ significantly from each other. For $s \ll N$ (illustrated by $s = 3$), the MaxProb and MeanProb distributions converge slowly towards the uniform distribution, given by the Shannon asymptotic form (22).

• The non-degenerate I:I statistic is also highly asymmetric, even more so than the D:I case; its MeanProb distribution decreases monotonically but remains positive. It has no evident Shannon-like asymptotic convergence either for $s = N$ or $s \ll N$. However, for $s = N$ it does exhibit a curious asymptotic form, as revealed by the total weighted occupancies $M_{I:I,i} = \sum_{\nu} n_i^{(\nu)} W_{I:I}^{(\nu)}$ in Table V; these, divided by the total weight, give the MeanProb distribution (9). As shown, $M_{I:I,i}$ converges as $N \to \infty$ to the sequence 1, 2, 4, 7, 12, 19, 30, 45, 67, 97, 139, ... rearranged in descending order; this is simply the sum (from zero) of partition numbers [62, 63]. This leads to the following:

Conjecture: For $s = N$, the numerator of the MeanProb distribution of the non-degenerate I:I statistic satisfies:

$$\lim_{N \to \infty} M_{I:I,i} = \lim_{N \to \infty} \sum_{\nu} n_i^{(\nu)} W_{I:I}^{(\nu)} = \sum_{a=0}^{N-i} \mathcal{P}(a) \qquad (27)$$

Corollary: For $s = N$, the MeanProb distribution of the non-degenerate I:I statistic satisfies:

$$\lim_{N \to \infty} \bar{n}_{I:I,i} = \frac{\sum_{a=0}^{N-i} \mathcal{P}(a)}{\mathcal{P}(N)} \qquad (28)$$

No attempt is made to prove these limits here. Convergence is quite rapid (valid at low $N$) towards the small end of the sequence ($i \to N$).
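
The sequence in the conjecture can be reproduced by enumerating integer partitions. The sketch below assumes $s = N$ and unit weight for every realization, as in (24), so that the total weighted occupancy of position $i$ is simply the sum of the $i$th largest part over all partitions of $N$; the function names are hypothetical. It checks the partial-sum form only at the small end of the sequence ($i > N/2$), where agreement is observed at finite $N$.

```python
def partitions(N, max_part=None):
    """Yield the integer partitions of N as non-increasing tuples."""
    if max_part is None:
        max_part = N
    if N == 0:
        yield ()
        return
    for first in range(min(N, max_part), 0, -1):
        for rest in partitions(N - first, first):
            yield (first,) + rest

def weighted_occupancies(N):
    """M_i for i = 1..N: sum of the i-th largest part over all partitions of N
    (each realization has weight 1 in the non-degenerate I:I statistic)."""
    M = [0] * N
    for parts in partitions(N):
        for i, ni in enumerate(parts):
            M[i] += ni
    return M

def partition_numbers(N):
    """Partition numbers p(0), ..., p(N) by the usual dynamic-programming recurrence."""
    p = [1] + [0] * N
    for part in range(1, N + 1):
        for total in range(part, N + 1):
            p[total] += p[total - part]
    return p

N = 12
M = weighted_occupancies(N)
p = partition_numbers(N)
for i in range(N, N // 2, -1):                 # small end of the sequence
    print(i, M[i - 1], sum(p[: N - i + 1]))    # the two columns agree here
mean = [m / p[N] for m in M]                   # MeanProb distribution
print([round(x, 3) for x in mean])
```

For $N = 12$ the last six entries of $M$ read 1, 2, 4, 7, 12, 19 from the small end, matching the partial sums of the partition numbers; entries nearer the large end deviate at finite $N$.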
Non-degenerate D:I and I:I statistics therefore differ markedly from MB and BE statistics. Their properties are summarised in Table VI. For indistinguishable categories, it is seen that asymmetry is inherent, whilst for distinguishable categories, asymmetry can only arise from a non-uniform degeneracy and/or the imposition of moment constraints (12).



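5.2. Equally Degenerate D:I and I:I Statistics

Now consider equally degenerate D:I statistics, in which each category $i$ contains $g$ equiprobable indistinguishable subcategories. The weight can be denoted [9]:

$$W_{D:I(g)} = \left\{\!\!\left\{ \begin{matrix} N \\ n_1, \dots, n_s \end{matrix} \right\}\!\!\right\}_{(g)} = \left\{\!\!\left\{ \begin{matrix} & N & \\ n_{11}, & \dots, & n_{s1} \\ \vdots & & \vdots \\ n_{1g}, & \dots, & n_{sg} \end{matrix} \right\}\!\!\right\} \qquad (29)$$

where $n_{im}$ is the occupancy of subcategory $m$ of category $i$ (hence $\sum_{m=1}^{g} n_{im} = n_i$). Again $k \le s$ is the number of filled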
| N | s | Actual MaxProb realization(s) $[n_i^*]$ (non-degen. MB statistic only) | $W_{MB}$ (each) (MB only) | $P_{MB}$ (each) (MB only) | MeanProb realization $[\bar{n}_i]$ (non-degen. MB and BE statistics) |
|---|---|---|---|---|---|
| 1 | 1 | [1] | 1 | 1 | [1] |
| 2 | 2 | [1, 1] | 2 | 0.5 | [1, 1] |
| 3 | 3 | [1, 1, 1] | 6 | 0.222222 | [1, 1, 1] |
| 4 | 4 | [1, 1, 1, 1] | 24 | 0.093750 | [1, 1, 1, 1] |
| 5 | 5 | [1, 1, 1, 1, 1] | 120 | 0.038400 | [1, 1, 1, 1, 1] |
| 10 | 10 | [1, ..., 1] | 3.63E+06 | 3.629E-04 | [1, ..., 1] |
| 20 | 20 | [1, ..., 1] | 2.43E+18 | 2.320E-08 | [1, ..., 1] |
| 30 | 30 | [1, ..., 1] | 2.65E+32 | 1.288E-12 | [1, ..., 1] |
| 40 | 40 | [1, ..., 1] | 8.16E+47 | 6.749E-17 | [1, ..., 1] |
| 50 | 50 | [1, ..., 1] | 3.04E+64 | 3.424E-21 | [1, ..., 1] |
| 1 | 3 | [1, 0, 0], [0, 1, 0], [0, 0, 1] | 1 | 0.333333 | [1/3, 1/3, 1/3] |
| 2 | 3 | [1, 1, 0], [1, 0, 1], [0, 1, 1] | 2 | 0.222222 | [2/3, 2/3, 2/3] |
| 3 | 3 | [1, 1, 1] | 6 | 0.222222 | [1, 1, 1] |
| 4 | 3 | [1, 1, 2], [1, 2, 1], [2, 1, 1] | 12 | 0.148148 | [4/3, 4/3, 4/3] |
| 5 | 3 | [1, 2, 2], [2, 1, 2], [2, 2, 1] | 30 | 0.123457 | [5/3, 5/3, 5/3] |
| 10 | 3 | [3, 3, 4], [3, 4, 3], [4, 3, 3] | 4200 | 0.071127 | [10/3, 10/3, 10/3] |
| 20 | 3 | [6, 7, 7], [7, 6, 7], [7, 7, 6] | 1.33E+08 | 0.038151 | [20/3, 20/3, 20/3] |
| 30 | 3 | [10, 10, 10] | 5.55E+12 | 0.026961 | [10, 10, 10] |
| 40 | 3 | [13, 13, 14], [13, 14, 13], [14, 13, 13] | 2.41E+17 | 0.019853 | [40/3, 40/3, 40/3] |
| 50 | 3 | [16, 17, 17], [17, 16, 17], [17, 17, 16] | 1.15E+22 | 0.016005 | [50/3, 50/3, 50/3] |



TABLE II: MaxProb and MeanProb realizations for non- degenerate MB and BE statistics, subject to (11) (in part after ^ 



categories. The weight and entropy are obtained as [9]:

k inin{g,ni) 

iV 1 

^D:I(g) 



H 



(AT) _ 

D:l{g) - TV 
N 

N 



(n'>.0(n''.0'"' 

lE(^lniV!-lnn,! + ln ^ W 

i=l ^ 7=1 

1 ^ 



(31) 



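where $\gamma$ is an index of filled subcategories. Details of the derivation of (30) are given in [9]. For $N \to \infty$, $n_i \to \infty, \forall i$, $\lim_{n_i \to \infty} \left\{ \begin{matrix} n_i \\ \gamma \end{matrix} \right\} = \gamma^{n_i}/\gamma!$ [64], $r_{j \ne \infty} = 0$ and $r_{\infty} = k \ll \infty$, (31) converges to the MB-like entropy $H_{D:I(g)} = -\sum_{i=1}^{s} p_i \ln (p_i/g)$, where $\left\{ \begin{matrix} n_i \\ g \end{matrix} \right\}$ is the dominant term in the sum over $\gamma$. Outside these limits, this asymptotic form is not obtained.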

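For equally degenerate I:I statistics, the weight can be denoted by:

$$W_{I:I(g)} = \left\lfloor\!\!\left\lfloor \begin{matrix} N \\ n_1, \dots, n_s \end{matrix} \right\rfloor\!\!\right\rfloor_{(g)} = \left\lfloor\!\!\left\lfloor \begin{matrix} & N & \\ n_{11}, & \dots, & n_{s1} \\ \vdots & & \vdots \\ n_{1g}, & \dots, & n_{sg} \end{matrix} \right\rfloor\!\!\right\rfloor \qquad (32)$$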



By enumeration of numerous examples, it can be estab- 
lished that the weight is given by: 



min(^,j) 



^I:I{g) 



(33) 



7=1 



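where $\aleph(a + b + \dots)^m$ is the Wronski aleph function [65, 66], a combinatorial polynomial or complete symmetric function [67, 68] given by a multinomial expansion with its coefficients omitted. For example, consider:

$$(a + b)^2 = a^2 + 2ab + b^2$$

$$(a + b)^3 = a^3 + 3a^2 b + 3ab^2 + b^3$$

$$(a + b)^m = \sum_{t=0}^{m} \binom{m}{t} a^t b^{m-t}$$

where $\binom{m}{t}$ is the binomial coefficient. The Wronski forms are:

$$\aleph(a + b)^2 = a^2 + ab + b^2$$

$$\aleph(a + b)^3 = a^3 + a^2 b + ab^2 + b^3$$

$$\aleph(a + b)^m = \sum_{t=0}^{m} a^t b^{m-t}$$

and in general:

$$\aleph(a_1 + a_2 + \dots + a_r)^m = \sum_{t_1, t_2, \dots, t_r} a_1^{t_1} a_2^{t_2} \dots a_r^{t_r} \qquad (34)$$

the sum taken over all permutations of $t_\gamma \ge 0$ which satisfy $\sum_{\gamma=1}^{r} t_\gamma = m$. Proof of (33) again proceeds from the successive filling of subcategories. An upper bound for the weight is given by the product, over all filled categories, of the number of subrealizations of $n_i$ entities in $\gamma$ subcategories; from (25), the latter is given by:

$$\mathcal{P}_\gamma(n_i) = \sum_{\substack{\text{all } \{n_{im}\} \\ \text{fixed } \gamma}} \left\lfloor\!\!\left\lfloor \begin{matrix} n_i \\ n_{i1}, \dots, n_{i\gamma}, 0, \dots, 0 \end{matrix} \right\rfloor\!\!\right\rfloor = \sum_{\substack{\text{all } \{n_{im}\} \\ \text{fixed } \gamma}} 1 \qquad (35)$$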
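
The Wronski aleph function described above, a multinomial expansion with its coefficients omitted, corresponds to the complete homogeneous symmetric polynomial and can be evaluated numerically by summing every distinct monomial of total degree $m$ once. This is a minimal sketch for numeric arguments only; the function name `wronski_aleph` is an illustrative choice, and the code does not implement the weight (33) itself.

```python
from itertools import combinations_with_replacement
from math import comb, prod

def wronski_aleph(values, m):
    """Sum of all distinct monomials of total degree m in the given variables,
    each taken with unit coefficient (complete homogeneous symmetric polynomial)."""
    # Each multiset of m variables corresponds to exactly one monomial.
    return sum(prod(term) for term in combinations_with_replacement(values, m))

a, b = 2, 3
print(wronski_aleph([a, b], 2), a**2 + a*b + b**2)                 # Wronski form, degree 2
print(wronski_aleph([a, b], 3), a**3 + a**2*b + a*b**2 + b**3)     # Wronski form, degree 3
print((a + b)**2, a**2 + 2*a*b + b**2)                             # ordinary binomial expansion

# With all variables set to 1 the function counts the monomials:
# C(r + m - 1, m) terms for r variables and degree m.
r, m = 3, 4
print(wronski_aleph([1] * r, m), comb(r + m - 1, m))
```

Setting every variable to 1 counts the number of terms, which is the number of ways of distributing $m$ indistinguishable items amongst $r$ distinguishable positions.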
| N | s | Actual MaxProb realization(s) $\{n_i^*\}$ (non-degenerate D:I statistic) | $W_{D:I}$ (each) | $P_{D:I}$ (each) | MeanProb realization $\{\bar{n}_i\}$ |
|---|---|---|---|---|---|
| 1 | 1 | {1} | 1 | 1 | {1} |
| 2 | 2 | {1, 1}, {2, 0} | 1 | 0.5 | {1.5, 0.5} |
| 3 | 3 | {2, 1, 0} | 3 | 0.6 | {2, 0.8, 0.2} |
| 4 | 4 | {2, 1, 1, 0} | 6 | 0.4 | {2.333, 1.133, 0.467, 0.067} |
| 5 | 5 | {2, 2, 1, 0, 0} | 15 | 0.288462 | {2.615, 1.462, 0.692, 0.212, 0.019} |
| 6 | 6 | {3, 2, 1, 0, 0, 0} | 60 | 0.295567 | {2.842, 1.759, 0.916, 0.399, 0.079, 4.93E-03} |
| 7 | 7 | {3, 2, 1, 1, 0, 0, 0} | 210 | 0.239453 | {3.058, 1.981, 1.166, 0.584, 0.185, 0.025, 1.14E-03} |
| 8 | 8 | {3, 2, 2, 1, 0, 0} | 840 | 0.202899 | {3.245, 2.173, 1.417, 0.761, 0.325, 0.071, 7.00E-03, 2.42E-04} |
| 9 | 9 | {3, 2, 2, 1, 1, 0, 0} | 3780 | 0.178749 | {3.419, 2.337, 1.643, 0.949, 0.477, 0.149, 0.024, 1.75E-03, 4.73E-05} |
| 10 | 10 | {3, 2, 2, 1, 1, 1, 0, ..., 0}, {3, 2, 2, 2, 1, 0, ..., 0}, {3, 3, 2, 1, 1, 0, ..., 0}, {4, 3, 2, 1, 0, ..., 0, 0, 0} | 12600 | 0.108644 | {3.576, 2.494, 1.827, 1.154, 0.629, 0.254, 0.058, 6.86E-03, 3.97E-04, 8.62E-06} |
| 20 | 20 | {4, 3, 3, 2, 2, 2, 2, 1, 1, 0, ..., 0}, {4, 4, 3, 3, 2, 2, 1, 1, 0, ..., 0} | 1.83E+12 | 0.035443 | {4.677, 3.623, 2.999, 2.479, 2.046, 1.666, 1.169, 0.729, 0.395, 0.160, 0.046, 9.26E-03, 1.93E-14} |
| 30 | 30 | {5, 4, 4, 3, 3, 3, 2, 2, 2, 1, 1, 0, ..., 0} | 1.54E+22 | 0.018214 | {5.376, 4.330, 3.710, 3.244, 2.880, 2.495, 2.131, 1.858, 1.507, 1.078, 0.703, 0.406, 0.191, 0.069, 0.019, 4.17E-03, 5.15E-22, 1.18E-24} |
| 40 | 40 | {5, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 0, ..., 0} | 1.14E+33 | 0.007265 | {5.892, 4.848, 4.246, 3.797, 3.405, 3.093, 2.832, 2.508, 2.181, 1.946, 1.691, 1.333, 0.952, 0.627, 0.366, 0.180, 0.072, 0.023, 5.93E-03, 6.35E-36} |
| 50 | 50 | {6, 5, 5, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 0, ..., 0} | 7.40E+44 | 0.003986 | {6.304, 5.262, 4.662, 4.225, 3.875, 3.542, 3.238, 3.016, 2.806, 2.516, 2.214, 1.996, 1.794, 1.505, 1.152, 0.815, 0.531, 0.307, 0.151, 0.062, 0.021, 6.00E-03, ..., 5.38E-48} |
| 1 | 3 | {1, 0, 0} | 1 | 1 | {1, 0, 0} |
| 2 | 3 | {1, 1, 0}, {2, 0, 0} | 1 | 0.5 | {1.5, 0.5, 0} |
| 3 | 3 | {2, 1, 0} | 3 | 0.6 | {2, 0.8, 0.2} |
| 4 | 3 | {2, 1, 1} | 6 | 0.428571 | {2.429, 1.143, 0.429} |
| 5 | 3 | {2, 2, 1} | 15 | 0.365854 | {2.805, 1.585, 0.610} |
| 6 | 3 | {3, 2, 1} | 60 | 0.491803 | {3.246, 1.893, 0.861} |
| 7 | 3 | {3, 2, 2}, {4, 2, 1} | 105 | 0.287671 | {3.682, 2.205, 1.112} |
| 8 | 3 | {3, 3, 2}, {4, 3, 1} | 280 | 0.255941 | {4.077, 2.592, 1.331} |
| 9 | 3 | {4, 3, 2} | 1260 | 0.384029 | {4.505, 2.903, 1.592} |
| 10 | 3 | {5, 3, 2} | 2520 | 0.256046 | {4.927, 3.218, 1.855} |
| 20 | 3 | {8, 7, 5} | 9.98E+07 | 0.171680 | {8.887, 6.582, 4.531} |
| 30 | 3 | {11, 10, 9} | 5.05E+12 | 0.147059 | {12.717, 9.907, 7.376} |


40 


3 


{15, 13, 12} 


2.09E+17 


0.103236 


{16.468, 13.236, 10.297} 


50 


3 


{18, 17, 15} 


1.02E+22 


0.085360 


{20.162, 16.578, 13.261} 



TABLE III: MaxProb and MeanProb reahzations for non-degenerate D:I statistics, subject to (11) (in part after [9]) 



whence: 



^I:Iig) 



< 



n( E ^7(^0 



(36) 



The product of X]™/^'^'^ ■^7(^0 must then be 

modified to account for multiple occurrences of the same subrealization(s) in different categories, which are indistinguishable. This is achieved using the Wronski aleph instead of a polynomial product, whereupon (36) yields (33). □
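The Wronski aleph of (34) can be evaluated numerically by brute force. The following is a minimal sketch, assuming only the definition given above (a multinomial expansion with its coefficients omitted); the function name `wronski_aleph` is illustrative and does not come from the cited references.

```python
from itertools import combinations_with_replacement
from math import comb, prod


def wronski_aleph(values, m):
    """Sum of all distinct monomials of total degree m in `values`.

    This is the expansion of (a_1 + ... + a_r)^m with the multinomial
    coefficients dropped (the complete homogeneous symmetric polynomial).
    """
    r = len(values)
    # Each multiset of m indices selects exactly one monomial
    # a_1^t_1 ... a_r^t_r with t_1 + ... + t_r = m.
    return sum(
        prod(values[i] for i in idx)
        for idx in combinations_with_replacement(range(r), m)
    )


if __name__ == "__main__":
    a, b = 2, 3
    print(wronski_aleph([a, b], 2), a**2 + a * b + b**2)
    print(wronski_aleph([a, b], 3), a**3 + a**2 * b + a * b**2 + b**3)
    # Setting every variable to 1 counts the monomials in the expansion.
    print(wronski_aleph([1, 1, 1], 4), comb(4 + 3 - 1, 4))
```

Setting every variable to 1 gives the number of terms in the sum of (34), which is the number of ways of choosing the exponents $t_\gamma$.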



The form of (33), based on the integers $j$ rather than occupancies $n_i$, is not very amenable to derivation of a combinatorial entropy function; further work is needed to determine if a more suitable form exists. In its absence, the MaxProb and MeanProb distributions can always be calculated using (33) by enumeration of all realizations.
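Enumeration of this kind is easy to sketch in code. The example below does not implement (33); it treats the simpler non-degenerate D:I statistic of Table III. It assumes that a realization with occupancies $n_i$, in which $r_j$ categories hold exactly $j \ge 1$ entities, has weight $N! / (\prod_i n_i! \prod_j r_j!)$, i.e. the number of set partitions with those block sizes. That weight is an assumption of this sketch, checked here only against entries of Table III, and all function names are illustrative.

```python
from collections import Counter
from math import factorial


def partitions(n, max_parts, largest=None):
    """Yield partitions of n into at most max_parts non-increasing parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    if max_parts == 0:
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, max_parts - 1, first):
            yield (first,) + rest


def weight_di(parts):
    """Assumed D:I weight N! / (prod n_i! * prod r_j!) for non-zero parts."""
    w = factorial(sum(parts))
    for p in parts:
        w //= factorial(p)
    for r in Counter(parts).values():  # r_j: categories with equal occupancy
        w //= factorial(r)
    return w


def maxprob_meanprob(n, s):
    """Return (MaxProb realizations, weight of each, probability of each, MeanProb)."""
    parts = list(partitions(n, s))
    weights = [weight_di(p) for p in parts]
    total = sum(weights)
    w_max = max(weights)
    pad = lambda p: p + (0,) * (s - len(p))  # empty categories shown as 0
    maxprob = [pad(p) for p, w in zip(parts, weights) if w == w_max]
    mean = [
        sum(w * pad(p)[i] for p, w in zip(parts, weights)) / total
        for i in range(s)
    ]
    return maxprob, w_max, w_max / total, mean


if __name__ == "__main__":
    for n, s in [(5, 5), (7, 3)]:
        print(n, s, maxprob_meanprob(n, s))
```

Because every realization is visited, the cost grows with the number of partitions of $N$, so this approach is practical only for modest $N$.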



To this point, we have examined the effect of different features of "ball-in-box" allocation schemes (Figure 1), including system size (non-asymptotic effects), various types of degeneracy, (in)distinguishability of the balls or boxes and occupancy restrictions. Many more choices are possible, e.g. how the configurations should be amalgamated into realizations, other occupancy restrictions such as non-empty cells, ordered occupancies, mixtures of distinguishability types, etc. [55, 56, 57, 58, 59]. Most of these options have not been examined from an entropic (inferential) perspective, and warrant further detailed investigation.

| N | s | Non-degenerate I:I statistic: MeanProb realization $\{\bar{n}_i\}$ |
|---|---|---|
| 1 | 1 | {1} |
| 2 | 2 | {1.5, 0.5} |
| 3 | 3 | {2, 0.667, 0.333} |
| 4 | 4 | {2.4, 1, 0.4, 0.2} |
| 5 | 5 | {2.857, 1.143, 0.571, 0.286, 0.143} |
| 10 | 10 | {4.571, 2.262, 1.286, 0.786, 0.476, 0.286, 0.167, 0.095, 0.048, 0.024} |
| 20 | 20 | {7.384, 4.056, 2.603, 1.775, 1.239, 0.879, 0.625, 0.447, 0.316, 0.223, 0.155, 0.107, 0.072, 0.048, 0.030, 0.019, 0.011, 0.006, 3.19E-03, 1.59E-03} |
| 30 | 30 | {9.736, 5.628, 3.795, 2.714, 1.998, 1.496, 1.131, 0.860, 0.655, 0.499, 0.380, 0.288, 0.218, 0.164, 0.122, 0.091, 0.067, 0.049, 1.78E-04} |
| 40 | 40 | {11.826, 7.059, 4.903, 3.608, 2.735, 2.111, 1.647, 1.295, 1.023, 0.810, 0.642, 0.509, 0.403, 0.318, 0.251, 0.198, 0.155, 0.121, 2.68E-05} |
| 50 | 50 | {13.736, 8.390, 5.947, 4.462, 3.450, 2.717, 2.165, 1.739, 1.404, 1.138, 0.924, 0.752, 0.612, 0.498, 0.405, 0.329, 0.267, 0.216, 0.174, 0.141, 0.113, ..., 4.90E-06} |
| 1 | 3 | {1, 0, 0} |
| 2 | 3 | {1.5, 0.5, 0} |
| 3 | 3 | {2, 0.667, 0.333} |
| 4 | 3 | {2.75, 1, 0.25} |
| 5 | 3 | {3.4, 1.2, 0.4} |
| 10 | 3 | {6.429, 2.643, 0.929} |
| 20 | 3 | {12.545, 5.409, 2.045} |
| 30 | 3 | {18.626, 8.187, 3.187} |
| 40 | 3 | {24.773, 10.955, 4.273} |
| 50 | 3 | {30.885, 13.731, 5.385} |

TABLE IV: MeanProb realizations for non-degenerate I:I statistics, subject to (11)

| N | s | Non-degenerate I:I statistic: Total weighted occupancies $M_{I:I,i}$ |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 3, 1 |
| 3 | 3 | 6, 2, 1 |
| 4 | 4 | 12, 5, 2, 1 |
| 5 | 5 | 20, 8, 4, 2, 1 |
| 6 | 6 | 35, 16, 8, 4, 2, 1 |
| 7 | 7 | 54, 24, 13, 7, 4, 2, 1 |
| 8 | 8 | 86, 41, 22, 13, 7, 4, 2, 1 |
| 9 | 9 | 128, 61, 35, 20, 12, 7, 4, 2, 1 |
| 10 | 10 | 192, 95, 54, 33, 20, 12, 7, 4, 2, 1 |
| 11 | 11 | 275, 136, 80, 49, 31, 19, 12, 7, 4, 2, 1 |
| 12 | 12 | 399, 204, 121, 76, 48, 31, 19, 12, 7, 4, 2, 1 |
| 13 | 13 | 556, 284, 172, 109, 71, 46, 30, 19, 12, 7, 4, 2, 1 |
| 14 | 14 | 780, 407, 247, 160, 105, 70, 46, 30, 19, 12, 7, 4, 2, 1 |
| 15 | 15 | 1068, 560, 347, 225, 151, 101, 68, 45, 30, 19, 12, 7, 4, 2, 1 |
| 16 | 16 | 1463, 779, 484, 320, 215, 147, 100, 68, 45, 30, 19, 12, 7, 4, 2, 1 |
| 17 | 17 | 1965, 1050, 661, 439, 300, 206, 143, 98, 67, 45, 30, 19, 12, 7, 4, 2, 1 |
| 18 | 18 | 2644, 1432, 906, 608, 418, 292, 203, 142, 98, 67, 45, 30, 19, 12, 7, 4, 2, 1 |
| 19 | 19 | 3498, 1901, 1215, 820, 570, 400, 283, 199, 140, 97, 67, 45, 30, 19, 12, 7, 4, 2, |
| 20 | 20 | 4630, 2543, 1632, 1113, 777, 551, 392, 280, 198, 140, 97, 67, 45, 30, 19, 12, 7, 4, |

TABLE V: Total weighted occupancies for non-degenerate I:I statistics for s = N, subject only to (11).
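The entries of Tables IV and V can be reproduced by direct enumeration. The sketch below assumes, as stated for the non-degenerate I:I statistic, that realizations are equiprobable and correspond to the partitions of $N$ into at most $s$ parts; the total weighted occupancy of position $i$ is then the sum of the $i$th largest part over all such partitions, and the MeanProb realization is that total divided by the number of partitions. Function names are illustrative.

```python
def partitions(n, max_parts, largest=None):
    """Yield partitions of n into at most max_parts non-increasing parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    if max_parts == 0:
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, max_parts - 1, first):
            yield (first,) + rest


def total_weighted_occupancies(n, s):
    """Return (totals, count): per-position sums over all realizations, and their number."""
    totals = [0] * s
    count = 0
    for p in partitions(n, s):
        count += 1
        for i, part in enumerate(p):
            totals[i] += part
    return totals, count


def meanprob_ii(n, s):
    """MeanProb realization when every realization has equal probability."""
    totals, count = total_weighted_occupancies(n, s)
    return [t / count for t in totals]


if __name__ == "__main__":
    print(total_weighted_occupancies(5, 5))
    print(meanprob_ii(4, 3))
    print([round(x, 3) for x in meanprob_ii(10, 3)])
```

A useful consistency check is that the totals for given $N$ and $s$ sum to $N$ times the number of realizations, since every realization contains exactly $N$ entities.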



6. URN MODELS 

We now consider the use of urn models - related to but distinct from "ball-in-box" models - for the mathematical representation of probabilistic systems. Urn models have a long history, being employed by Jacob Bernoulli and Laplace [18], and occupying the attention of many traditional statisticians during the 20th century [e.g. 69, 70, 71, 72]. A simple example is represented in Figure 3, in which balls are drawn from an urn containing a total of $M$ balls, made up of $m_i$ balls of the $i$th colour, for $i = 1, \ldots, s$. A ball is drawn in accordance with some rule, recorded, and then returned to the urn and/or the urn modified in some manner; the sampling is repeated until a sample of $N$ balls, consisting of $n_i$ of each colour, is obtained [c.f. 71, 72]. The urn model is used to generate the probability distribution $P$ of the sampling scheme and, usually, its asymptotic behaviour ($M \to \infty$ and/or $N \to \infty$) is examined. Many extraordinarily complicated urn models have been devised, involving the conditional drawing and/or replacement of ball(s) from a single or multiple urns [69, 70].

| | Distinguishable Entities | Indistinguishable Entities |
|---|---|---|
| Distinguishable Categories | Non-degenerate MB statistics. MaxProb and MeanProb. Highly symmetric. Strongly asymptotic to uniform distribution. | Non-degenerate BE statistics. MeanProb only; realizations equiprobable. Highly symmetric. Strongly asymptotic to uniform distribution. |
| Indistinguishable Categories | Non-degenerate D:I statistics. MaxProb and MeanProb. Highly asymmetric. Slowly asymptotic to uniform distribution, $s \ll N$. Non-asymptotic for $s = N$. | Non-degenerate I:I statistics. MeanProb only; realizations equiprobable. Highly asymmetric. Non-asymptotic to uniform distribution, $s \ll N$. Monotonic decreasing asymptote (28) for $s = N$. |

TABLE VI: Properties of non-degenerate statistics, subject only to (11)



FIG. 3: Urn model representation of a probabilistic system.

Although a very old device, the new perspective here is that urn models generate a governing probability $P$ which can be converted, by Boltzmann's principle, to a cross-entropy function (3). One can then apply the tools of probabilistic inference, such as the MaxProb and MeanProb principles defined previously, to infer the properties of the system. Surprisingly few physicists, mathematicians or information theorists have exploited this technique, despite Boltzmann's principle being over 130 years old [1]. Although it does simplify the calculations, it is not necessary that the system be asymptotic; furthermore, by the use of modern-day optimisation and numerical methods, many types of systems can be examined, such as those in which $P$ is not in closed form. Many quite complicated probabilistic systems involving conditional probabilities - e.g. Markovian or non-Markovian chains, random walks, networks, transport systems and games - can therefore be analysed in this manner.

It is known that MB, BE and FD statistics can be constructed by simple urn models, respectively involving sampling with replacement, double replacement or without replacement, in the asymptotic limits $M \to \infty$, $N \to \infty$ and $N/M \to \beta$ [11, 22]. Two recent studies [11, 76] have extended these scenarios using the Polya urn model, in which the ball is returned after each draw and $c$ balls of the same colour are also added [73, 74, 75]:

$$P_{Polya} = \frac{N!}{\prod_{i=1}^{s} n_i!} \, \frac{\prod_{i=1}^{s} m_i (m_i + c) \cdots (m_i + (n_i - 1)c)}{M (M + c) \cdots (M + (N - 1)c)} \qquad (37)$$

This is a closed-form example of "neither independent nor identically distributed" sampling, since the probability of drawing a ball of colour $i$ changes (conditionally) during sampling. Eqs. (3) and (37) were then used to derive the Polya cross-entropy function. This includes MB, BE and FD statistics as special cases, and in general gives rise to the Acharya-Swamy intermediate statistic [11]. It is also shown that extremisation of the Kullback-Leibler function, in a Polya system, infers a distribution which asymptotically vanishes and is therefore unrepresentative of the system [76].
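The Polya urn probability (37) is simple to evaluate exactly for small systems. The sketch below is an illustration under stated assumptions rather than the method of the cited studies: it uses the standard form of the multivariate Polya distribution with integer $m_i$ and $c$, and the names `rising`, `polya_probability` and `compositions` are hypothetical helpers. It can be used to check the special cases noted above: $c = 0$ (sampling with replacement, multinomial), $c = -1$ (sampling without replacement, hypergeometric) and $c = 1$ (double replacement, for which $m_i = 1$ makes all realizations equiprobable).

```python
from fractions import Fraction
from math import factorial


def rising(x, n, c):
    """x (x + c) ... (x + (n - 1) c); the empty product (n = 0) is 1."""
    out = 1
    for k in range(n):
        out *= x + k * c
    return out


def polya_probability(n, m, c):
    """Exact probability of occupancies n = (n_1..n_s) from an urn with
    m = (m_1..m_s) balls of each colour and c balls added per draw.

    For c = -1 the sample size must not exceed the urn size, otherwise
    the denominator vanishes.
    """
    N, M = sum(n), sum(m)
    coeff = factorial(N)
    numerator = 1
    for n_i, m_i in zip(n, m):
        coeff //= factorial(n_i)          # multinomial coefficient N! / prod n_i!
        numerator *= rising(m_i, n_i, c)  # colour-i factor in the numerator
    return Fraction(coeff * numerator, rising(M, N, c))


def compositions(total, parts):
    """Yield all ordered tuples of `parts` non-negative integers summing to total."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


if __name__ == "__main__":
    m = (3, 2, 1)
    for c in (0, 1, -1):
        total = sum(polya_probability(n, m, c) for n in compositions(4, 3))
        print(c, total)  # each distribution should be normalised
    print(polya_probability((2, 1, 1), m, 0))
    print(polya_probability((2, 1, 1), m, -1))
```

Exact rational arithmetic is used so that normalisation can be tested for equality; for large $N$ a logarithmic form would be preferable.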



7. GRAPHICAL SYSTEMS 

Finally, we consider systems which can be represented 
in graphical form. Graph theory is one of the mainstays 
of modern-day combinatorics, and there are few proba- 
bilistic systems which cannot be represented in this man- 
ner. As well as graphs (formally defined below), a wide 
range of specialist concepts are available, including trees, 
networks, posets, cycles, chains, lattices and necklaces 
[e.g. "SF^ 371 EH]- As with urn models, the insight here 
is the ability to infer the "typical" properties of the sys- 
tem, for which the MaxProb and (possibly) the Mean- 
Prob principles are eminently suited. These may involve 
the derivation of an entropy or cross-entropy function, for extremisation subject to the constraints on the system. Curiously, however, few combinatorial or graph-theoretical studies invoke an entropy concept or seek the most probable realization of the system; most published studies which consider the graph entropy (defined below) stem from information theory [e.g. 77, 78, 79, 80, 81].

We first define several terms [58, 77, 78, 79, 80, 81, 82]:

• The non-Cartesian product of two sets $A$ and $B$ is given by $A \,\&\, B = \{\{a, b\} \mid a \in A, b \in B\}$, i.e. the set of unordered pairs $\{a, b\}$ taken without repetition.

• An undirected graph is the ordered triple $G = (V, E, \psi)$, consisting of a non-empty finite set of vertices $V = \{v_i\}$, a finite set of edges $E = \{e_j\}$ with $E \cap V = \emptyset$, and a function $\psi : E \to V \,\&\, V$. In other words, $\psi$ maps edges $e_i$ to a pair of vertices $\{v_j, v_k\}$, without regard to order.

• A simple graph is an undirected graph without single-node loops $e_i \to \{v_j, v_j\}$ or multiple edges (distinct edges mapped to the same pair of vertices).

• A complete graph is a simple graph in which $\psi$ is surjective, i.e. all pairs of vertices have an edge.

• Two complementary graphs $G$ and $\bar{G}$ have the same vertex set $V$ and disjoint edge sets $E$ and $\bar{E}$, such that $\psi : E \cup \bar{E} \to V \,\&\, V$ gives a complete graph.

• A colouring or proper colouring of a graph $G$ is a partition of the vertex set $V$ into edge-independent disjoint sets (colour classes), such that every edge joins vertices in two different colour classes.

• The chromatic number $\chi(G)$ of a graph $G$ is the smallest number of classes in any colouring of $G$.

• The vertex packing polytope $VP(G)$ of a graph $G$ is the convex hull of the characteristic vectors of stable sets of $G$ [81].

It is also possible to consider directed graphs or digraphs, in which each edge has a direction [58, 82]; these are not examined further here.

FIG. 4: Graph of several Danish letters.

The graph entropy concept follows from consideration of communications signals of length $N$, consisting of letters $v_j \in V$ from an alphabet $V$, represented as vertices of a graph. If the letters are considered distinguishable, they are made adjacent (joined by an edge). As an example, consider the Danish vowels in Figure 4, which if scanned by English-language optical character recognition software, may exhibit the distinguishability relations shown. Its chromatic number is $\chi = 3$. The graph entropy of a simple graph $G$ on the vertex set $V = \{v_1, \ldots, v_s\}$, with corresponding probability distribution $P = \{p_1, \ldots, p_s\}$, is then defined as [77, 78, 79]:

$$H(G, P) = \limsup_{N \to \infty} \frac{1}{N} \log_2 \left( \chi(G_P^N) + 1 \right) \qquad (38)$$

where $G_P^N$ implies a graph with distribution $P$ and signal length $N$. A very different but more tractable definition, demonstrated to be equivalent, is [80, 81]:

$$H(G, P) = \min_{a \in VP(G)} \sum_{i=1}^{s} p_i \log_2 \frac{1}{a_i} \qquad (39)$$



A third definition of (G, P) is based on the mutual 
information [79l[^. From a combinatorial perspective, 
the graph entropy enables the handling of categories with 
"heterogeneous" distinguishability, a superset of the D:I 
statistic analysed herein (Q. It also exhibits several in- 
teresting properties; e.g. the entropies of two complemen- 
tary graphs are additive and equal to that of the complete 
graph [80, SB IBS], in some sense analogous to the addi- 
tive nature of the thermodynamic entropy. The graph 
entropy is, however, exclusively asymptotic ($N \to \infty$).

Substantially more research is required on the compatibility of the definition of graph entropy (38)-(39) with Boltzmann's principle, and on the application of probabilistic inference (e.g. the MaxProb principle) to systems represented in graphical form.
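The colouring and chromatic number definitions used in this section can be illustrated by brute force on a small graph. The sketch below is only an illustration: the edge list of Figure 4 is not available in the text, so a hypothetical 5-cycle is used instead, and the function names are assumptions of this example. The search is exponential in the number of vertices and is intended for very small graphs only.

```python
from itertools import product


def is_proper(colouring, edges):
    """True if every edge joins vertices in two different colour classes."""
    return all(colouring[u] != colouring[v] for u, v in edges)


def chromatic_number(num_vertices, edges):
    """Smallest number of colour classes in any proper colouring (brute force).

    Vertices are labelled 0 .. num_vertices - 1 and `edges` is a list of
    unordered vertex pairs of a simple graph.
    """
    if num_vertices == 0:
        return 0
    for k in range(1, num_vertices + 1):
        for colouring in product(range(k), repeat=num_vertices):
            if is_proper(colouring, edges):
                return k
    # Only reachable if the graph has a single-node loop, i.e. is not simple.
    raise ValueError("no proper colouring exists; graph is not simple")


if __name__ == "__main__":
    # Hypothetical example: five letters whose distinguishability forms a 5-cycle.
    cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    print(chromatic_number(5, cycle5))
    # Complete graph on four vertices: every pair is distinguishable.
    k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
    print(chromatic_number(4, k4))
```

An odd cycle needs three colour classes, while a complete graph needs one per vertex, which gives two quick sanity checks on the definition.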



8. CONCLUSIONS 

This study examines probabilistic systems defined by 
T(/7, G, ^), in which entities Um ^ U are mapped to 
categories G G by a probabilistic random variable ^; 
the resulting distinguishable configurations {U ^ C} are 
then grouped into realizations in accordance with some 
aggregation rule. The combinatorial or probabilistic def- 
initions of entropy H and cross-entropy proportional 
respectively to the logarithm of the weight or probability 
of a specified realization ([2|-(|3| ("Boltzmann's princi- 
ple"), are then considered. These are defined so that 
extremisation of H or subject to any constraints, 
always selects the "most probable" (MaxProb) realiza- 
tion(s) of the system ([8|. Another useful measure of cen- 
tral tendency of a system is its mean-weighted (Mean- 
Prob) realization, the average of all realizations weighted 
by their weight or probability [4]. For multinomial sys- 
tems, the combinatorial definitions (|2|-([3| converge to 
the Shannon entropy or Kullback-Liebler cross-entropy 
in the asymptotic limit A^ ^ oo. However, as is made 
clear in this study, many systems may not be multino- 
mial and/or may not have an asymptotic limit. Such sys- 
tems cannot meaningfully be analysed with Dkl or Hsh^ 
but can be analysed directly by MaxProb and/or Mean- 
Prob. This is illustrated by several examples, including 
(a) non-asymptotic systems; (b) systems with indistin- 
guishable entities (quantum statistics); (c) systems with 
indistinguishable categories; (d) systems represented by 
urn models, such as "neither independent nor identically 
distributed" (ninid) sampling; and (e) systems repre- 
sentable in graphical form, such as decision trees and net- 
works. Particular attention is devoted to (c), especially 
to analysis of the LI statistic, including (i) identification 
of an asymptotic form of its non-degenerate MeanProb 
realization, and (ii) derivation of its non-degenerate sta- 
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tistical weight, in terms of partition numbers, coding pa- 
rameters and the Wronski aleph function. The potential 
for significant new research, especiahy in (d) and (e), is 
also highlighted. 

It is shown that the Boltzmann principle (2)-(3) leads to many different entropy or cross-entropy measures for different combinatorial systems, united by a common (MaxProb) principle (8) founded in probability theory.
In contrast, the Shannon and Kullback-Leibler functions 
of information theory - which are often claimed to be 
universal measures of uncertainty applicable to all prob- 
abilistic systems |^ |^ ^3 - do not have such a uni- 
versal foundation. Indeed, in many systems, the distri- 
bution inferred by the Shannon or Kullback-Leibler func- 
tions can be shown to be unrepresentative of the system 

iai7l[3[IOllIIl[23|30l[3ll|3aiM.M. 3^ The 

combinatorial definition of entropy (Boltzmann's princi- 
ple) is therefore of fundamentally greater importance, for 
the purpose of inferring the properties of a probabilistic 



system, than the definitions adopted in information the- 
ory. 
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